{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "df19606e-d67e-44d2-bed0-4b804e6fc6c3",
   "metadata": {},
   "source": [
    "# Token Predictors\n",
    "\n",
    "Using our token predictors, we can predict the token usage of an operation before actually performing it.\n",
    "\n",
    "We first show how to predict LLM token usage with the MockLLMPredictor class, see below.\n",
    "We then show how to also predict embedding token usage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1a9eb90-335c-4214-8bb6-fd1edbe3ccbd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# My OpenAI Key\n",
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"INSERT OPENAI KEY\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a707fa6-d79e-4343-92fd-d0fadb25c466",
   "metadata": {},
   "source": [
    "## Using MockLLMPredictor"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be3f7baa-1c0a-430b-981b-83ddca9e71f2",
   "metadata": {
    "tags": []
   },
   "source": [
    "#### Predicting Usage of GPT Tree Index\n",
    "\n",
    "Here we predict usage of TreeIndex during index construction and querying, without making any LLM calls.\n",
    "\n",
    "NOTE: Predicting query usage before tree is built is only possible with TreeIndex due to the nature of tree traversal. Results will be more accurate if TreeIndex is actually built beforehand."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c0ef16d1-45ef-43ec-9aad-4e44e9bb8578",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import (\n",
    "    TreeIndex,\n",
    "    MockLLMPredictor,\n",
    "    SimpleDirectoryReader,\n",
    "    ServiceContext,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b2ecdadc-1403-4bd4-a876-f80e4da911ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = SimpleDirectoryReader(\"../paul_graham_essay/data\").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "11056808-fd7f-4bc6-9348-0605fb4ee668",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm_predictor = MockLLMPredictor(max_tokens=256)\n",
    "service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ea4ba66-9a09-4478-b0a8-dee8645fa4e3",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = TreeIndex.from_documents(documents, service_context=service_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "345433c2-5553-4645-a513-0186b771a21f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19495\n"
     ]
    }
   ],
   "source": [
    "print(llm_predictor.last_token_usage)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f43733ae-af35-46e6-99d9-8ba507acbb0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# default query\n",
    "query_engine = index.as_query_engine(service_context=service_context)\n",
    "response = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "4ba19751-da2d-46af-9f8f-4f42871e65a0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5493\n"
     ]
    }
   ],
   "source": [
    "print(llm_predictor.last_token_usage)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4324d85b-ae80-48ab-baf0-7dc160dfae46",
   "metadata": {},
   "source": [
    "#### Predicting Usage of GPT Keyword Table Index Query\n",
    "\n",
    "Here we build a real keyword table index over the data, but then predict query usage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "10447805-38db-41b9-a2c6-b0c95437b276",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import KeywordTableIndex, MockLLMPredictor, SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "8ca76e72-5f43-47c1-a9a4-c5c5db4f0f21",
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = SimpleDirectoryReader(\"../paul_graham_essay/data\").load_data()\n",
    "index = KeywordTableIndex.from_documents(documents=documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "61f48870-65d2-4b23-b57e-79082ecb4ab2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "start token ct: 0\n",
      "> Starting query: What did the author do after his time at Y Combinator?\n",
      "query keywords: ['author', 'did', 'y', 'combinator', 'after', 'his', 'the', 'what', 'time', 'at', 'do']\n",
      "Extracted keywords: ['combinator']\n",
      "> Querying with idx: 3483810247393006047: of 2016 we moved to England. We wanted our kids...\n",
      "> Querying with idx: 7597483754542696814: people edit code on our server through the brow...\n",
      "> Querying with idx: 7572417251450701751: invited about 20 of the 225 groups to interview...\n",
      "end token ct: 11313\n",
      "> [query] Total token usage: 11313 tokens\n",
      "11313\n"
     ]
    }
   ],
   "source": [
    "llm_predictor = MockLLMPredictor(max_tokens=256)\n",
    "service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)\n",
    "query_engine = index.as_query_engine(service_context=service_context)\n",
    "response = query_engine.query(\"What did the author do after his time at Y Combinator?\")\n",
    "print(llm_predictor.last_token_usage)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0fee4405-05e0-46c2-87bb-64ec63a4c6c1",
   "metadata": {},
   "source": [
    "#### Predicting Usage of GPT List Index Query\n",
    "\n",
    "Here we build a real list index over the data, but then predict query usage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "267f2213-67d1-4241-b73f-f1790661d06b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import ListIndex, MockLLMPredictor, SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "d553a8b1-7045-4756-9729-df84bd305279",
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = SimpleDirectoryReader(\"../paul_graham_essay/data\").load_data()\n",
    "index = ListIndex.from_documents(documents=documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "69c99c68-6a23-48ed-aa41-e7af50fef2f3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "start token ct: 0\n",
      "> Starting query: What did the author do after his time at Y Combinator?\n",
      "end token ct: 23941\n",
      "> [query] Total token usage: 23941 tokens\n"
     ]
    }
   ],
   "source": [
    "llm_predictor = MockLLMPredictor(max_tokens=256)\n",
    "service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)\n",
    "query_engine = index.as_query_engine(service_context=service_context)\n",
    "response = query_engine.query(\"What did the author do after his time at Y Combinator?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "e8422c5c-af68-4138-a8dd-f6e8d7208c4c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "23941\n"
     ]
    }
   ],
   "source": [
    "print(llm_predictor.last_token_usage)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e19cf61-6d6a-4dfa-af78-1ce184f41c6c",
   "metadata": {},
   "source": [
    "## Using MockEmbedding"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "106d86bf-7725-40bc-84ba-4f273493d3f6",
   "metadata": {},
   "source": [
    "#### Predicting Usage of GPT Simple Vector Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "9baf0fe7-2c11-4233-a930-4e593433ba84",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import (\n",
    "    VectorStoreIndex,\n",
    "    MockLLMPredictor,\n",
    "    MockEmbedding,\n",
    "    SimpleDirectoryReader,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "97023361-fa47-4008-b8d7-e66d60c5b263",
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = SimpleDirectoryReader(\"../paul_graham_essay/data\").load_data()\n",
    "index = VectorStoreIndex.from_documents(documents=documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "63ebe021-2b9c-4024-95f8-56cd9e7e7c47",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> [query] Total LLM token usage: 4374 tokens\n",
      "> [query] Total embedding token usage: 14 tokens\n"
     ]
    }
   ],
   "source": [
    "llm_predictor = MockLLMPredictor(max_tokens=256)\n",
    "embed_model = MockEmbedding(embed_dim=1536)\n",
    "service_context = ServiceContext.from_defaults(\n",
    "    llm_predictor=llm_predictor, embed_model=embed_model\n",
    ")\n",
    "query_engine = index.as_query_engine(\n",
    "    service_context=service_context,\n",
    ")\n",
    "response = query_engine.query(\n",
    "    \"What did the author do after his time at Y Combinator?\",\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "gpt_retrieve_venv",
   "language": "python",
   "name": "gpt_retrieve_venv"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
